On Fri, Dec 5, 2008 at 12:57 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Thu, Dec 4, 2008 at 11:47 AM, Noble Paul നോബിള്‍ नोब्ळ्
> <[EMAIL PROTECTED]> wrote:
>> I tried that and the solution looked clumsy.
>> Having to commit before I could read anything back made things difficult.
>
> In a high update environment, most documents would be exposed to an
> open reader with no need to commit or reopen the index to retrieve the
> stored fields.
> In a way, solving the more realtime update issue removes the necessity
> for this altogether.
>
>> Is a Lucene write much faster than a DB (embedded) write?
>
> More to the point, we're already doing the Lucene write (for the most
> part) anyway, and the DB write is overhead to the indexing process.
Considering that the extra Lucene write is over and above the normal
indexing, I guess we must compare the cost of indexing one document in
Lucene against the cost of writing one row to a DB.
A DB gives me the option of writing to a remote machine, thus freeing up
my local disk; Lucene has to write to the local disk.

In the DB I am writing a byte[] (which is quite compressed). Lucene may
end up writing more data, and hence doing more disk I/O (this is just a
theory).
Does Lucene allow me to write a byte[]?
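If it does, it would presumably be via the stored-only binary field
constructor; a rough sketch, assuming the Lucene 2.x API of the time (the
field name is made up):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class BinaryFieldSketch {
  // Wrap an already-compressed payload in a stored-only binary field.
  // The field is neither indexed nor tokenized; it can only be read
  // back as stored data.
  static Document wrap(byte[] compressedPayload) {
    Document doc = new Document();
    doc.add(new Field("stored_payload", compressedPayload, Field.Store.YES));
    return doc;
  }
}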

The Lucene API itself is more complex for this kind of operation
(disclaimer: I do not know a whole lot of it).

Moreover, this is just an UpdateRequestProcessor (no changes to the
core). We could have a Lucene-based one as well.
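Roughly, the JDBC-based processor would be along these lines. This is only
a sketch, assuming the Solr 1.3 UpdateRequestProcessor API; the
uncommitted_docs table and the serialize() helper are made-up placeholders:

import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class JdbcBackupProcessor extends UpdateRequestProcessor {
  private final Connection conn;

  public JdbcBackupProcessor(Connection conn, UpdateRequestProcessor next) {
    super(next);
    this.conn = conn;
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.solrDoc;
    try {
      // Keep a serialized copy of the document so it can be read back
      // before it has ever been committed to the Lucene index.
      PreparedStatement ps = conn.prepareStatement(
          "INSERT INTO uncommitted_docs (id, payload) VALUES (?, ?)");
      ps.setString(1, String.valueOf(doc.getFieldValue("id")));
      ps.setBytes(2, serialize(doc)); // placeholder helper, see below
      ps.executeUpdate();
      ps.close();
    } catch (SQLException e) {
      throw new IOException("could not persist document: " + e);
    }
    super.processAdd(cmd); // continue the normal update chain
  }

  // Placeholder: any compact binary serialization of the document would do.
  private byte[] serialize(SolrInputDocument doc) {
    return doc.toString().getBytes();
  }
}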

Most users (the performance-sensitive ones) would not use this feature,
and the ones who do random updates will not notice it.
The only problem is for users who index heavily and still want to enable this.


>
> -Yonik
>
>> On Thu, Dec 4, 2008 at 10:07 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>>> A database, just to store uncommitted documents in case they might be
>>> updated, seems like it will have a pretty major impact on indexing
>>> performance.  A Lucene-only implementation would seem to be much
>>> lighter on resources.
>>>
>>> -Yonik
>>>
>>> On Thu, Dec 4, 2008 at 11:32 AM, Noble Paul നോബിള്‍ नोब्ळ्
>>> <[EMAIL PROTECTED]> wrote:
>>>> The solution will be an UpdateRequestProcessor (which is itself
>>>> pluggable). I am implementing a JDBC-based one. I'll test with H2 and
>>>> MySQL (and maybe Derby).
>>>>
>>>> We will ship the H2 (embedded) jar
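Opening the embedded H2 store itself would be plain JDBC, something like
the following; the file path and table layout here are only illustrative
guesses:

import java.sql.Connection;
import java.sql.DriverManager;

public class EmbeddedH2Sketch {
  public static void main(String[] args) throws Exception {
    // "jdbc:h2:<path>" runs H2 embedded, inside the Solr JVM.
    Class.forName("org.h2.Driver");
    Connection conn = DriverManager.getConnection(
        "jdbc:h2:./solr/data/uncommitted", "sa", "");
    conn.createStatement().execute(
        "CREATE TABLE IF NOT EXISTS uncommitted_docs"
        + " (id VARCHAR(256) PRIMARY KEY, payload BLOB)");
    conn.close();
  }
}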
>>>>
>>>> On Thu, Dec 4, 2008 at 9:53 PM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
>>>>> Again, I would hope that Solr builds a storage-agnostic solution.
>>>>>
>>>>> As long as we have a simple interface to load/store documents, it should
>>>>> be easy to write a JDBC/ehcache/disk/Cassandra/whatever implementation.
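Something like the following, perhaps. The interface name and signatures
are only an illustration of the idea, not an existing Solr API:

import java.io.IOException;

import org.apache.solr.common.SolrInputDocument;

public interface UncommittedDocumentStore {
  /** Persist (or overwrite) the document under its unique key. */
  void store(String id, SolrInputDocument doc) throws IOException;

  /** @return the stored document, or null if the id is unknown. */
  SolrInputDocument load(String id) throws IOException;

  /** Drop the document once it has been committed to the index. */
  void delete(String id) throws IOException;
}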
>>>>>
>>>>> ryan
>>>>>
>>>>>
>>>>> On Dec 4, 2008, at 10:29 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>>>
>>>>>> Cassandra does not meet our requirements;
>>>>>> we do not need that kind of scalability.
>>>>>>
>>>>>> Moreover, its future is uncertain and they are still trying to incubate it
>>>>>> into Apache.
>>>>>>
>>>>>>
>>>>>> On Thu, Dec 4, 2008 at 8:52 PM, Sami Siren <[EMAIL PROTECTED]> wrote:
>>>>>>>
>>>>>>> Yet another possibility: http://wiki.apache.org/incubator/Cassandra
>>>>>>>
>>>>>>> It at least claims to be scalable; I have no personal experience with it.
>>>>>>>
>>>>>>> --
>>>>>>> Sami Siren
>>>>>>>
>>>>>>>> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>>>>>>
>>>>>>>> Another persistence solution is Ehcache with its disk store. It even has
>>>>>>>> replication.
>>>>>>>>
>>>>>>>> I have never used Ehcache, so I cannot comment on it myself.
>>>>>>>>
>>>>>>>> Any comments?
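Untested, but the Ehcache disk-store idea would presumably look something
like this (an Ehcache 1.x-style cache; the cache name and sizes are
arbitrary):

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class EhcacheStoreSketch {
  public static void main(String[] args) {
    CacheManager manager = CacheManager.create();
    // overflowToDisk=true spills entries beyond the in-memory limit to the
    // configured diskStore directory; eternal=true disables expiry.
    Cache cache = new Cache("uncommittedDocs", 1000, true, true, 0, 0);
    manager.addCache(cache);

    cache.put(new Element("doc-1", new byte[] { 1, 2, 3 }));
    Element hit = cache.get("doc-1");
    System.out.println(hit == null ? "miss" : "hit");

    manager.shutdown();
  }
}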
>>>>>>>>
>>>>>>>> --Noble
>>>>>>>>
>>>>>>>> On Wed, Dec 3, 2008 at 8:50 PM, Noble Paul നോബിള്‍ नोब्ळ्
>>>>>>>> <[EMAIL PROTECTED]> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Dec 3, 2008 at 5:52 PM, Grant Ingersoll <[EMAIL PROTECTED]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Dec 3, 2008, at 1:28 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The code can be written against JDBC, but we need to test the DDL and
>>>>>>>>>>> data types on all the supported DBs.
>>>>>>>>>>>
>>>>>>>>>>> But which one would we like to ship with Solr as the default option?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Why do we need a default option?  Is this something that is intended to
>>>>>>>>>> be on by default?  Or, do you mean just to have one for unit tests to
>>>>>>>>>> work?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Default does not mean that it is enabled by default. But if it is
>>>>>>>>> enabled, I can have defaults for things like the driver, URL, DDL, etc.,
>>>>>>>>> and the user may not need to provide an extra jar.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I don't know if it is still the case, but I often find embedded DBs to
>>>>>>>>>> be quite annoying since you often can't connect to them from other
>>>>>>>>>> clients outside of the JVM, which makes debugging harder.  Of course,
>>>>>>>>>> maybe I just don't know the tricks to do it.  Derby is one DB that you
>>>>>>>>>> can still connect to even when it is embedded.
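For reference, the Derby trick is that the same JVM that embeds the
database can also start the Derby network server, so an outside JDBC
client can attach for debugging. A rough sketch (database name and port
are illustrative):

import java.net.InetAddress;
import java.sql.Connection;
import java.sql.DriverManager;

import org.apache.derby.drda.NetworkServerControl;

public class EmbeddedDerbyWithNetworkAccess {
  public static void main(String[] args) throws Exception {
    // Open the database embedded, inside this JVM.
    Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
    Connection local = DriverManager.getConnection(
        "jdbc:derby:uncommittedDocs;create=true");

    // Also expose it over the network, so an external client can connect
    // with jdbc:derby://localhost:1527/uncommittedDocs
    NetworkServerControl server = new NetworkServerControl(
        InetAddress.getByName("localhost"), 1527);
    server.start(null);

    local.close();
  }
}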
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Embedded is the best bet for us, for performance reasons and zero
>>>>>>>>> management.
>>>>>>>>> The users can still read the data through Solr itself.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Also, whatever is chosen needs to scale to millions of documents, and I
>>>>>>>>>> wonder about an embedded DB doing that.  I also have a hard time
>>>>>>>>>> believing that both a DB w/ millions of docs and Solr can live on the
>>>>>>>>>> same machine, which is presumably what an embedded DB must do.
>>>>>>>>>> Presumably, it also needs to be able to be replicated, right?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Millions of docs? Then you must configure a remote DB for storage
>>>>>>>>> reasons, and manage the replication separately.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> H2 looks impressive. The jar is small (just 667KB) and the memory
>>>>>>>>>>> footprint is small too.
>>>>>>>>>>> --Noble
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Dec 3, 2008 at 10:30 AM, Ryan McKinley <[EMAIL PROTECTED]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Check http://www.h2database.com/ (in my view the best embedded DB out
>>>>>>>>>>>> there).
>>>>>>>>>>>>
>>>>>>>>>>>> It is from the maker of HSQLDB... his second round.
>>>>>>>>>>>>
>>>>>>>>>>>> However, for anything in Solr, I would hope it would just rely on
>>>>>>>>>>>> JDBC.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Dec 2, 2008, at 12:08 PM, Shalin Shekhar Mangar wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> HSQLDB has a limit of up to 8GB of data. In Solr, you might want to
>>>>>>>>>>>>> go beyond that without a commit.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Dec 2, 2008 at 10:33 PM, Dawid Weiss
>>>>>>>>>>>>> <[EMAIL PROTECTED]>wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Isn't HSQLDB an option? Its performance varies a lot depending on
>>>>>>>>>>>>>> the volume of data and queries, but otherwise the license looks
>>>>>>>>>>>>>> BSDish.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://hsqldb.org/web/hsqlLicense.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dawid
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Shalin Shekhar Mangar.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> --Noble Paul
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --------------------------
>>>>>>>>>> Grant Ingersoll
>>>>>>>>>>
>>>>>>>>>> Lucene Helpful Hints:
>>>>>>>>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>>>>>>>>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> --Noble Paul
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> --Noble Paul
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> --Noble Paul
>>>>
>>>
>>
>>
>>
>> --
>> --Noble Paul
>>
>



-- 
--Noble Paul
