retrieve datefield value from document

2013-06-14 Thread Mingfeng Yang
I have an index that was first built with Solr 1.4 and later upgraded to Solr 3.6.
It has 150 million documents, and every doc has a date field that is not
blank (verified by a Solr query).

I am using the following code snippet to retrieve the dates:

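import java.io.File;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// placeholder path to the Solr data/index directory
Directory indexDir = FSDirectory.open(new File("/path/to/index"));
IndexReader input = IndexReader.open(indexDir);
int maxDoc = input.maxDoc();
for (int i = 0; i < maxDoc; i++) {
  // load the stored fields of doc i inside the loop
  Document d = input.document(i);
  System.out.println(d.get("date"));
}
input.close();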

However, about 100 million docs return null for d.get("date"), while the
other 50 million or so return the right values.

What could be wrong?

Ming-


Re: retrieve datefield value from document

2013-06-14 Thread Michael Della Bitta
Shot in the dark:

You're using Lucene to read the index. That's sort of circumventing all the
typing stuff that Solr does. Solr can deal with an index where some of the
segments are in one format (say 1.4) and others are in another (3.6). Maybe
they're being stored in a format in the newer (or older) segments that
doesn't work with raw retrieval of the values through Lucene in the same
way.

Maybe it's able to retrieve the stored value from the indexed
representation in one case rather than needing to store it.
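To make this theory concrete, here is a stdlib-only toy model (class and method names are hypothetical; the two layouts, ISO-8601 text versus an 8-byte big-endian long of epoch milliseconds, are assumptions and not a description of Solr's actual storage) of one date stored two ways. As I understand Lucene 3.x, Document.get(String) skips binary fields, so a text-only read like d.get("date") would come back null for the binary variant, while a type-aware read recovers the same instant from both:

```java
import java.nio.ByteBuffer;
import java.time.Instant;

/**
 * Hypothetical model of one stored "date" value as a raw reader might see it:
 * either text (assumed ISO-8601, e.g. "2013-06-14T16:15:00Z") or an 8-byte
 * big-endian long of epoch milliseconds. Not a Lucene or Solr class.
 */
final class StoredValue {
    private final String text;  // non-null for text-stored values
    private final byte[] bytes; // non-null for binary-stored values

    private StoredValue(String text, byte[] bytes) {
        this.text = text;
        this.bytes = bytes;
    }

    static StoredValue ofText(String iso) {
        return new StoredValue(iso, null);
    }

    static StoredValue ofMillis(long epochMillis) {
        return new StoredValue(null, ByteBuffer.allocate(8).putLong(epochMillis).array());
    }

    /** Text-only read, like a getter that ignores binary values: null for binary. */
    String textOrNull() {
        return text;
    }

    /** Type-aware read that understands both representations. */
    Instant toInstant() {
        if (text != null) {
            return Instant.parse(text);
        }
        return Instant.ofEpochMilli(ByteBuffer.wrap(bytes).getLong());
    }
}
```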

I'd query your index using EmbeddedSolrServer instead and see if that
changes what you see.


Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions
w: appinions.com http://www.appinions.com/


Re: retrieve datefield value from document

2013-06-14 Thread Dmitry Kan
Maybe a document was marked as deleted?

IndexReader.isDeleted(int): http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/index/IndexReader.html#isDeleted(int)
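Why this matters for the loop in the question: maxDoc() is one greater than the largest document number and still counts deleted documents until their segments are merged away, so a 0..maxDoc loop should check isDeleted(i) before reading. A stdlib-only toy model of that id space (the class and its methods are hypothetical analogues, not the Lucene API):

```java
import java.util.BitSet;

/**
 * Toy model of a reader's doc-id space (not the Lucene API): ids run from
 * 0 to maxDoc - 1, and deleted ids keep their slot until segments are merged.
 */
final class DocIdSpace {
    private final int maxDoc;
    private final BitSet deleted = new BitSet();

    DocIdSpace(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    void delete(int docId) {
        if (docId < 0 || docId >= maxDoc) {
            throw new IndexOutOfBoundsException("doc id " + docId);
        }
        deleted.set(docId);
    }

    /** Analogue of isDeleted(int). */
    boolean isDeleted(int docId) {
        return deleted.get(docId);
    }

    /** Analogue of numDocs(): live documents only. */
    int numDocs() {
        return maxDoc - deleted.cardinality();
    }

    /** Walks 0..maxDoc-1 like the loop in the question, skipping deleted slots. */
    int countVisited() {
        int visited = 0;
        for (int i = 0; i < maxDoc; i++) {
            if (isDeleted(i)) {
                continue;
            }
            visited++; // a real loop would call document(i) here
        }
        return visited;
    }
}
```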


Re: retrieve datefield value from document

2013-06-14 Thread Mingfeng Yang
Michael,

That's what I thought as well.  I would assume an optimization of the index
would rewrite all documents in the newer format then?

Ming-



Re: retrieve datefield value from document

2013-06-14 Thread Mingfeng Yang
Hi Dmitry,

No, the docs are not deleted.

Ming-


Re: retrieve datefield value from document

2013-06-14 Thread Michael Della Bitta
Yes, that should be what happens. But then I'd guess you'd be able to
retrieve no dates at all through raw Lucene. I've encountered this myself.


Re: retrieve datefield value from document

2013-06-14 Thread Mingfeng Yang
How did you solve the problem then?

Ming


Re: retrieve datefield value from document

2013-06-14 Thread Michael Della Bitta
Use EmbeddedSolrServer rather than Lucene directly.


Re: retrieve datefield value from document

2013-06-14 Thread Mingfeng Yang
Figured out the solution.

The date field in those documents was stored as binary, so what I needed to do
is:

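import java.nio.ByteBuffer;
import java.util.Date;
import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.Fieldable;

Fieldable df = doc.getFieldable(fname); // fname is e.g. "date"
byte[] ary = df.getBinaryValue();
// the stored value is an 8-byte long: milliseconds since the epoch
ByteBuffer bb = ByteBuffer.wrap(ary, df.getBinaryOffset(), df.getBinaryLength());
long num = bb.getLong();
Date dt = DateTools.stringToDate(DateTools.timeToString(num,
DateTools.Resolution.SECOND)); // throws java.text.ParseException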

Then dt is a java.util.Date with the right value, which you can format as needed.
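For anyone without a Lucene jar at hand, the same decoding steps can be sketched with the JDK alone. This is a hypothetical helper (the BinaryDate name and its methods are illustrative): it reads the 8 bytes as a big-endian long (ByteBuffer's default order, as in the snippet above), truncates to whole seconds, which is what the DateTools round trip at Resolution.SECOND should amount to (an assumption, worth checking against DateTools), and prints Solr-style UTC text:

```java
import java.nio.ByteBuffer;
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

/** Hypothetical helper mirroring the decoding steps above with the JDK only. */
final class BinaryDate {
    private BinaryDate() {}

    /** Reads bytes[offset, offset + length) as one big-endian long (ByteBuffer's default order). */
    static long readMillis(byte[] bytes, int offset, int length) {
        if (length != Long.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes, got " + length);
        }
        return ByteBuffer.wrap(bytes, offset, length).getLong();
    }

    /** Drops sub-second precision (assumed equivalent to the DateTools SECOND round trip). */
    static Instant toSecondResolution(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).truncatedTo(ChronoUnit.SECONDS);
    }

    /** UTC ISO-8601 text, e.g. 2013-06-14T16:15:00Z. */
    static String toIsoString(Instant instant) {
        return DateTimeFormatter.ISO_INSTANT.format(instant);
    }
}
```

If millisecond precision matters, skipping the DateTools round trip and using new Date(num) (or Instant.ofEpochMilli(num)) directly should keep it; that is a design choice, not something tested in this thread.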

Ming-

