Re: how can i use solrj binary format for indexing?

2010-10-21 Thread Jason, Kim

Hi Gora, I really appreciate it.
Your reply was a great help to me. :)
I hope everything is fine with you.

Regards,
Jason





-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/how-can-i-use-solrj-binary-format-for-indexing-tp1722612p1750669.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how can i use solrj binary format for indexing?

2010-10-20 Thread Gora Mohanty
On Mon, Oct 18, 2010 at 8:22 PM, Jason, Kim hialo...@gmail.com wrote:

Sorry for the delay in replying. Was caught up in various things this
week.

 Thank you for the reply, Gora.

 But I still have several questions.
 Did you use separate indexes?
 If so, did you index 0.7 million XML files per instance
 and then merge them? Is that right?

Yes, that is correct. We sharded the data by user ID, so that each of the five
instances (five cores each, 25 cores in all) held approximately 0.7 million of
the 3.5 million records, i.e., about 0.14 million per core. We could have used
the sharded indices directly for search, but at least for now have decided to
go with a single, merged index.

 Please let me know how you worked with multiple instances and cores in your case.
[...]

* Multi-core Solr setup is quite easy, via configuration in solr.xml:
  http://wiki.apache.org/solr/CoreAdmin . The configuration, i.e., the
  schema, solrconfig.xml, etc., needs to be replicated across the
  cores.
* Decide which XML files you will post to which core, and do the
  POST with curl, as usual. You might need to write a little script
  to do this.
* After indexing on the cores is done, make sure to do a commit
  on each.
* Merge the sharded indexes (if desired) as described here:
  http://wiki.apache.org/solr/MergingSolrIndexes . One thing to
  watch out for here is disk space. When merging with Lucene
  IndexMergeTool, we found that a rough rule of thumb was that
  intermediate steps in the merge would require about twice as
  much space as the total size of the indexes to be merged. I.e.,
  if one is merging 40GB of data in sharded indexes, one should
  have at least 120GB free.
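The "little script" from the second step can be sketched in plain Java. This is a minimal sketch, assuming five instances with five cores each as described above, hypothetical host names solr1 to solr5 on port 8983, hypothetical core names core0 to core4, and that each record's user ID is already known. It routes a user ID to a core by hashing (the thread says the data was sharded by user ID, not how), builds each core's update and commit URLs (assuming the update handler accepts a `commit=true` parameter), and includes a JDK-only POST that stands in for curl. `ShardRouter` and all of its methods are illustrative names, not part of Solr.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical routing helper: 5 instances (solr1..solr5:8983) x 5 cores (core0..core4).
 * Host names, port and core names are assumptions made for illustration.
 */
public class ShardRouter {
    static final int INSTANCES = 5;
    static final int CORES_PER_INSTANCE = 5;
    static final int TOTAL_CORES = INSTANCES * CORES_PER_INSTANCE;

    /** The same user ID always maps to the same shard number in [0, TOTAL_CORES). */
    static int shardFor(String userId) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(userId.hashCode(), TOTAL_CORES);
    }

    /** Base URL of the core holding a shard, e.g. shard 7 -> http://solr2:8983/solr/core2. */
    static String coreUrl(int shard) {
        int instance = shard / CORES_PER_INSTANCE;   // 0-based instance index
        int core = shard % CORES_PER_INSTANCE;       // 0-based core index on that instance
        return "http://solr" + (instance + 1) + ":8983/solr/core" + core;
    }

    /** Update handler URL that a record with this user ID should be POSTed to. */
    static String updateUrlFor(String userId) {
        return coreUrl(shardFor(userId)) + "/update";
    }

    /** One commit URL per core (commit=true is an assumed handler parameter). */
    static List<String> commitUrls() {
        List<String> urls = new ArrayList<>();
        for (int shard = 0; shard < TOTAL_CORES; shard++) {
            urls.add(coreUrl(shard) + "/update?commit=true");
        }
        return urls;
    }

    /** Plain-Java stand-in for the curl POST of one Solr XML file; returns the HTTP status. */
    static int postXml(String updateUrl, Path xmlFile) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
        try (OutputStream out = conn.getOutputStream()) {
            Files.copy(xmlFile, out);   // send the file bytes unchanged
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }
}
```

A file belonging to user `u` would be sent with `postXml(updateUrlFor(u), file)` (or with curl against the same URL), and each URL from `commitUrls()` requested once all files are in. Hashing spreads many distinct user IDs roughly evenly, but all records of one user land on the same core, so per-core counts are only approximately equal.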

Regards,
Gora


Re: how can i use solrj binary format for indexing?

2010-10-18 Thread Peter Karich
Hi,

you can try to parse the XML yourself in Java and then push the resulting
SolrInputDocuments to Solr via SolrJ.
Setting the format to binary and using the streaming update server
(StreamingUpdateSolrServer) should improve performance,
but I am not sure... and reading XML efficiently in Java (with less memory!) is
another topic ... ;-)
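Parsing the XML yourself in Java can be done with the JDK's streaming StAX API (`javax.xml.stream`), which also speaks to the memory concern: the reader below never builds a tree of the whole file and hands each document on as soon as it is complete. This is a minimal sketch, assuming the input files are already in Solr's `<add><doc><field name="...">` update format (adjust the element names for any other XML). Each field map is what one would copy into a SolrJ `SolrInputDocument` with `addField(name, value)` before sending it; that SolrJ step is left out so the sketch compiles against the JDK alone. The class name `SolrXmlStreamer` is hypothetical.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/**
 * Hypothetical StAX reader for Solr-style <add><doc><field name="...">value</field></doc></add>
 * XML. Each completed <doc> is passed to a callback, so only one document is held in memory.
 */
public class SolrXmlStreamer {

    static void readDocs(Reader in, Consumer<Map<String, List<String>>> onDoc) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
            try {
                Map<String, List<String>> current = null;
                while (r.hasNext()) {
                    int event = r.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        if ("doc".equals(r.getLocalName())) {
                            current = new LinkedHashMap<>();
                        } else if ("field".equals(r.getLocalName()) && current != null) {
                            String name = r.getAttributeValue(null, "name");
                            // getElementText() reads the field's text and stops on </field>
                            String value = r.getElementText();
                            // Repeated field names are kept as multiple values
                            current.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT
                            && "doc".equals(r.getLocalName()) && current != null) {
                        onDoc.accept(current);   // e.g. copy into a SolrInputDocument here
                        current = null;
                    }
                }
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML input", e);
        }
    }
}
```

Using a callback rather than returning a list is the design choice that keeps memory flat: a caller can convert and send each document (or small batch) immediately instead of accumulating millions of them.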

Regards,
Peter.



-- 
http://jetwick.com twitter search prototype



Re: how can i use solrj binary format for indexing?

2010-10-18 Thread Jason, Kim

Hi, Gora
I haven't yet tried indexing a huge number of XML files through curl or pure
Java (like post.jar).
Is indexing through XML really fast?
How many files did you index? And how did you do it (using curl or pure Java)?

Thanks, Gora
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/how-can-i-use-solrj-binary-format-for-indexing-tp1722612p1724645.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how can i use solrj binary format for indexing?

2010-10-18 Thread Gora Mohanty
On Mon, Oct 18, 2010 at 5:26 PM, Jason, Kim hialo...@gmail.com wrote:

 Hi, Gora
 I haven't yet tried indexing a huge number of XML files through curl or pure
 Java (like post.jar).
 Is indexing through XML really fast?
 How many files did you index? And how did you do it (using curl or pure Java)?
[...]

We did it through curl. There were some 3.5 million XML files, and some
60 fields in the Solr schema, with minor tokenising, though with some
facets. In total, there was about 40GB of data. We used five Solr instances,
and five cores on each instance. From what I recall, it took 6h, though here
we might well have been limited by the read speed on a slow network
drive that held the data. If done in this way, one might need to merge the
data from the various cores, a task which took us about 1.5h.

Regards,
Gora


Re: how can i use solrj binary format for indexing?

2010-10-18 Thread Ryan McKinley
Do you already have the files as Solr XML? If so, I don't think you need SolrJ.

If you need to build SolrInputDocuments from your existing structure,
SolrJ is a good choice. If you are indexing lots of documents, check out
StreamingUpdateSolrServer:
http://lucene.apache.org/solr/api/solrj/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.html
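The point of `StreamingUpdateSolrServer` is that the caller does not wait on each HTTP round trip: it is constructed with a Solr URL, a queue size and a thread count, buffers incoming update requests in a queue, and lets background threads stream them to Solr. The sketch below is a hypothetical, JDK-only illustration of that queue-plus-worker-threads pattern, not SolrJ's implementation; the class `QueuedSender` and its batching step are inventions of this sketch. In real code the `sink` would call SolrJ (for example `server.add(docs)`). For the binary format itself, the SolrJ wiki describes switching the client's request writer with `setRequestWriter(new BinaryRequestWriter())`, which relies on a binary update handler on the Solr side (in Solr 1.4-era configs, `/update/javabin` mapped to `solr.BinaryUpdateRequestHandler`); whether `StreamingUpdateSolrServer` honours that writer may depend on the SolrJ version, so it is worth checking before combining the two.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

/**
 * Hypothetical, stdlib-only illustration of the queue-plus-worker-threads pattern (not SolrJ).
 * Producers call submit(); worker threads drain a bounded queue and pass batches to 'sink',
 * which must be thread-safe. Handling of a sink that throws is left out of this sketch.
 */
public class QueuedSender<T> {
    // One end-of-input marker is queued per worker so that every thread can stop.
    private static final Object POISON = new Object();

    private final BlockingQueue<Object> queue;
    private final List<Thread> workers = new ArrayList<>();

    public QueuedSender(int queueSize, int threadCount, int batchSize, Consumer<List<T>> sink) {
        this.queue = new ArrayBlockingQueue<>(queueSize);
        for (int i = 0; i < threadCount; i++) {
            Thread worker = new Thread(() -> drain(batchSize, sink));
            workers.add(worker);
            worker.start();
        }
    }

    /** Blocks while the queue is full, which throttles a producer that outruns the senders. */
    public void submit(T item) throws InterruptedException {
        queue.put(item);
    }

    /** Signals end of input, then waits until every queued item has been passed to the sink. */
    public void close() throws InterruptedException {
        for (int i = 0; i < workers.size(); i++) {
            queue.put(POISON);
        }
        for (Thread worker : workers) {
            worker.join();
        }
    }

    @SuppressWarnings("unchecked")
    private void drain(int batchSize, Consumer<List<T>> sink) {
        List<T> batch = new ArrayList<>();
        try {
            while (true) {
                Object item = queue.take();
                if (item == POISON) {
                    break;
                }
                batch.add((T) item);
                if (batch.size() >= batchSize) {
                    sink.accept(batch);          // e.g. one SolrJ add call per batch
                    batch = new ArrayList<>();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (!batch.isEmpty()) {
            sink.accept(batch);                  // flush the final, partial batch
        }
    }
}
```

The bounded queue is the important design choice: a fast producer, such as an XML reader, is slowed down instead of piling millions of pending documents into memory.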


On Sun, Oct 17, 2010 at 11:01 PM, Jason, Kim hialo...@gmail.com wrote:

 Hi all
 I have a huge number of XML files to index.
 I want to index them using the SolrJ binary format to get a performance gain,
 because I heard that indexing with XML files is quite slow.
 But I don't know how to index through the SolrJ binary format, and I can't find
 any examples.
 Please give me some help.
 Thanks,



Re: how can i use solrj binary format for indexing?

2010-10-18 Thread Jason, Kim

Thank you for the reply, Gora.

But I still have several questions.
Did you use separate indexes?
If so, did you index 0.7 million XML files per instance
and then merge them? Is that right?
Please let me know how you worked with multiple instances and cores in your case.

Regards,
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/how-can-i-use-solrj-binary-format-for-indexing-tp1722612p1725679.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: how can i use solrj binary format for indexing?

2010-10-18 Thread Sharp, Jonathan
 Hi all
 I have a huge number of XML files to index.
 I want to index them using the SolrJ binary format to get a performance gain,
 because I heard that indexing with XML files is quite slow.
 But I don't know how to index through the SolrJ binary format, and I can't find
 any examples.
 Please give me some help.
 Thanks,

You might want to take a look at this section of the wiki, too:
http://wiki.apache.org/solr/Solrj#Setting_the_RequestWriter

-Jon

-----Original Message-----
From: Jason, Kim [mailto:hialo...@gmail.com] 
Sent: Monday, October 18, 2010 7:52 AM
To: solr-user@lucene.apache.org
Subject: Re: how can i use solrj binary format for indexing?


Thank you for the reply, Gora.

But I still have several questions.
Did you use separate indexes?
If so, did you index 0.7 million XML files per instance
and then merge them? Is that right?
Please let me know how you worked with multiple instances and cores in your case.

Regards,





how can i use solrj binary format for indexing?

2010-10-17 Thread Jason, Kim

Hi all
I have a huge number of XML files to index.
I want to index them using the SolrJ binary format to get a performance gain,
because I heard that indexing with XML files is quite slow.
But I don't know how to index through the SolrJ binary format, and I can't find
any examples.
Please give me some help.
Thanks,
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/how-can-i-use-solrj-binary-format-for-indexing-tp1722612p1722612.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how can i use solrj binary format for indexing?

2010-10-17 Thread Gora Mohanty
On Mon, Oct 18, 2010 at 8:31 AM, Jason, Kim hialo...@gmail.com wrote:

 Hi all
 I have a huge number of XML files to index.
 I want to index them using the SolrJ binary format to get a performance gain,
 because I heard that indexing with XML files is quite slow.
[...]

Do not know about SolrJ's binary format, but indexing through XML
is quite fast in our experience. Have you tried it out to see if it meets
your requirements?

Regards,
Gora