retrieve datefield value from document
I have an index that was first built with Solr 1.4 and later upgraded to Solr 3.6. It holds 150 million documents, and every document has a non-blank date field (verified with a Solr query). I am using the following code snippet to retrieve the values:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.store.*;
    import org.apache.lucene.document.*;

    IndexReader input = IndexReader.open(indexDir);
    int maxDoc = input.maxDoc();
    for (int i = 0; i < maxDoc; i++) {
        Document d = input.document(i);  // load the stored fields of doc i
        System.out.println(d.get("date"));
    }

However, d.get("date") returns null for about 100 million docs and the right value for the other 50 million. What could be wrong?

Ming-
Re: retrieve datefield value from document
Shot in the dark:

You're using Lucene to read the index, which sort of circumvents all the field typing that Solr does. Solr can deal with an index where some of the segments are in one format (say 1.4) and others are in another (3.6). Maybe the dates in the newer (or older) segments are stored in a format that raw retrieval through Lucene doesn't handle in the same way. Or maybe, in one case, Solr is able to recover the value from the indexed representation rather than needing it stored.

I'd query your index using EmbeddedSolrServer instead and see if that changes what you see.

Michael Della Bitta
Applications Developer, appinions inc.

On Fri, Jun 14, 2013 at 4:15 PM, Mingfeng Yang wrote:
> What could be wrong?
Re: retrieve datefield value from document
Maybe the documents were marked as deleted? See IndexReader.isDeleted(int):
http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/index/IndexReader.html#isDeleted(int)

On Fri, Jun 14, 2013 at 11:25 PM, Michael Della Bitta wrote:
> Shot in the dark: You're using Lucene to read the index.
Re: retrieve datefield value from document
Michael,

That's what I thought as well. I would assume that optimizing the index would then rewrite all documents in the newer format?

Ming-

On Fri, Jun 14, 2013 at 1:25 PM, Michael Della Bitta wrote:
> Solr can deal with an index where some of the segments are in one format (say 1.4) and others are in another (3.6).
Re: retrieve datefield value from document
Hi Dmitry,

No, the docs are not deleted.

Ming-

On Fri, Jun 14, 2013 at 1:31 PM, Dmitry Kan wrote:
> Maybe a document was marked as deleted?
Re: retrieve datefield value from document
Yes, that should be what happens. But then I'd guess you wouldn't be able to retrieve any dates at all. I've encountered this myself.

On Jun 14, 2013 6:05 PM, Mingfeng Yang wrote:
> I would assume an optimization of the index would rewrite all documents in the newer format then?
Re: retrieve datefield value from document
How did you solve the problem then?

Ming

On Fri, Jun 14, 2013 at 3:24 PM, Michael Della Bitta wrote:
> I've encountered this myself.
Re: retrieve datefield value from document
Use EmbeddedSolrServer rather than Lucene directly.

On Jun 14, 2013 6:47 PM, Mingfeng Yang wrote:
> How did you solve the problem then?
Re: retrieve datefield value from document
Figured out the solution. The date field in those documents was stored as binary, so what I should do is:

    import java.nio.ByteBuffer;
    import java.util.Date;
    import org.apache.lucene.document.DateTools;
    import org.apache.lucene.document.Fieldable;

    Fieldable df = doc.getFieldable(fname);
    byte[] ary = df.getBinaryValue();      // raw stored bytes of the date field
    ByteBuffer bb = ByteBuffer.wrap(ary);
    long num = bb.getLong();               // read the 8 bytes as a long
    Date dt = DateTools.stringToDate(DateTools.timeToString(num, DateTools.Resolution.SECOND));

Then you get dt as a Date holding the right value.

Ming-

On Fri, Jun 14, 2013 at 4:20 PM, Michael Della Bitta wrote:
> Use EmbeddedSolrServer rather than Lucene directly.
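
For anyone who wants to try the byte-decoding step from the solution in isolation, here is a minimal sketch that uses only the JDK. It assumes the stored binary value is an 8-byte, big-endian count of milliseconds since the epoch, which is what the ByteBuffer.getLong() call in the solution implies; that layout is an assumption, not something confirmed in this thread. The class name StoredDateDecoder and the method decode are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Date;

public class StoredDateDecoder {

    /**
     * Decodes a stored binary date into a java.util.Date.
     * Assumption: the bytes are an 8-byte big-endian epoch-millisecond value.
     * offset/length let the caller point at a slice of a larger array.
     */
    public static Date decode(byte[] raw, int offset, int length) {
        if (raw == null || length != Long.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes of stored date data");
        }
        // ByteBuffer is big-endian by default, so getLong() reads high byte first.
        long millis = ByteBuffer.wrap(raw, offset, length).getLong();
        return new Date(millis);
    }

    public static void main(String[] args) {
        // Build a sample value the same way it is assumed to be stored.
        byte[] sample = ByteBuffer.allocate(Long.BYTES).putLong(1371226500000L).array();
        Date dt = decode(sample, 0, sample.length);
        System.out.println(dt.toInstant());
    }
}
```

With Lucene 3.x, the arguments would presumably come from the Fieldable (its binary value, offset, and length). Unlike the solution in the thread, this sketch does not pass the value through DateTools at SECOND resolution, so the milliseconds are kept as-is.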