Hi Jeroen and others who replied to my mail...  Let me further explain
my use case and the existing infrastructure.

My customer stores their product data in XML files on the file system.

E.g. 
  ${repofolder}/
        products/
                product-1/      
                        product-1.xml
                        product-1-image.jpg
                        ...
                product-2/      
                        product-2.xml
                        product-2-image.jpg
                ...

This is a simplified representation, but as you can see there is no
XML database involved.

Now let's start with a small fictitious example for product-1.xml:

<product>
  <id>xxxx</id>
  <description>grandma's cookies</description>
  <category>food</category>
  <price>2.0</price>
</product>
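
To make the example concrete, here is a minimal sketch of pulling the
searchable fields out of such a file. Python and the function name
`parse_product` are assumptions chosen for illustration only; the
element names come from the example above.

```python
import xml.etree.ElementTree as ET

# The fictitious product document from the example above.
SAMPLE = """<product>
  <id>xxxx</id>
  <description>grandma's cookies</description>
  <category>food</category>
  <price>2.0</price>
</product>"""


def parse_product(xml_text):
    """Return the searchable fields of one product document as a dict.

    Hypothetical helper: it assumes every product has exactly these
    four child elements directly under <product>.
    """
    root = ET.fromstring(xml_text)
    return {
        "id": root.findtext("id"),
        "description": root.findtext("description"),
        "category": root.findtext("category"),
        "price": float(root.findtext("price")),
    }


product = parse_product(SAMPLE)
```

Only these few small values are needed for searching, regardless of
how large the full product file is.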

From a functional point of view they want to be able to search for
products based on some criteria.  So I'll have to build a small
search form containing:
        - a dropdown with all possible categories
        - a textbox to search for part of the description
        - a price search: "between / equal to / greater than / less than"

So for certain filter criteria I'll have to get all possible values so
they can pick one, and for others I don't need to know anything about
the actual data.
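
The "extract into a database" option could look roughly like the
sketch below. It uses Python's built-in sqlite3 purely as a stand-in
for something like HSQLDB; the table layout, the function names and
the sample rows are all made up for illustration. It shows the two
kinds of criteria: one that needs the list of existing values (the
category dropdown) and ones that do not (description text, price
range).

```python
import sqlite3

# Made-up sample rows; in practice these would be extracted from the XML files.
PRODUCTS = [
    {"id": "p1", "description": "grandma's cookies", "category": "food", "price": 2.0},
    {"id": "p2", "description": "garden chair", "category": "furniture", "price": 35.0},
    {"id": "p3", "description": "chocolate cookies", "category": "food", "price": 3.5},
]


def build_index(products):
    """Load the extracted fields into an in-memory table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product ("
        "id TEXT PRIMARY KEY, description TEXT, category TEXT, price REAL)"
    )
    conn.executemany(
        "INSERT INTO product VALUES (:id, :description, :category, :price)",
        products,
    )
    return conn


def categories(conn):
    """All distinct categories, e.g. to fill the dropdown."""
    rows = conn.execute("SELECT DISTINCT category FROM product ORDER BY category")
    return [row[0] for row in rows]


def search(conn, category=None, text=None, min_price=None, max_price=None):
    """Return matching product ids; every criterion is optional.

    Price bounds are inclusive, so "equal to" is min_price == max_price.
    Note that % and _ inside `text` act as LIKE wildcards in this sketch.
    """
    clauses, params = [], []
    if category is not None:
        clauses.append("category = ?")
        params.append(category)
    if text:
        clauses.append("description LIKE ?")
        params.append("%" + text + "%")
    if min_price is not None:
        clauses.append("price >= ?")
        params.append(min_price)
    if max_price is not None:
        clauses.append("price <= ?")
        params.append(max_price)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    sql = "SELECT id FROM product" + where + " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]


conn = build_index(PRODUCTS)
```

For example, `categories(conn)` gives the dropdown values and
`search(conn, category="food", text="cookies", max_price=3.0)` combines
all three criteria in one query.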

The actual product XML files are roughly 500 KB on average, and I'm
talking about LOTS of products, so I have to consider performance up front.
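
One way to keep the extraction step cheap for large files is to stream
through each file and stop as soon as the searchable fields have been
seen. The sketch below assumes the wanted elements appear once each and
are not reused as tag names deeper in the document; `extract_fields`
and `WANTED` are hypothetical names.

```python
import io
import xml.etree.ElementTree as ET

# Element names taken from the fictitious product example.
WANTED = {"id", "description", "category", "price"}


def extract_fields(source, wanted=WANTED):
    """Stream one product file and collect the first occurrence of each
    wanted element, stopping early once all of them have been found.

    `source` is a file name or a binary file-like object.
    """
    found = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in wanted and elem.tag not in found:
            found[elem.tag] = (elem.text or "").strip()
            if len(found) == len(wanted):
                break  # no need to read the rest of the file
        elem.clear()  # drop content we have already looked at
    return found


# Small in-memory document standing in for a real product file.
sample = io.BytesIO(
    b"<product><id>xxxx</id><description>grandma's cookies</description>"
    b"<category>food</category><price>2.0</price><rest>...</rest></product>"
)
fields = extract_fields(sample)
```

The values come back as strings; converting the price to a number would
happen when the record is stored in whatever index is used.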

Solr seems good for indexing static HTML files etc., but I don't get the
impression it can offer the necessary functionality for this use case.

Any comments?

Cheers,
Robby





-----Original Message-----
From: Jeroen Reijn [mailto:j.re...@onehippo.com] 
Sent: Tuesday, September 08, 2009 9:01 AM
To: users@cocoon.apache.org
Subject: Re: how-to query an xml repository efficiently

Hi Robby,

Do you perhaps have any more specs on what kind of XML database it is?

At our company we have experience with an Apache Slide backed database,
which we used for storing XML files, and we let Slide index them with
Lucene. Then, based on DASL queries, we could search the repository
really quickly.

Next to DASL, I know there are also XML databases that can use XQuery
to perform fast searches on their XML data.

Regards,

Jeroen

Robby Pelssers wrote:
> Hi all,
> 
>  
> 
> I have the following use case.  The customer has an XML repository which
> is nothing more than a directory on the filesystem which contains
> subdirectories containing one or more XML files.  They now want to
> query those XML files on some predefined criteria which might change
> over time...
> 
>  
> 
> I'm looking for a solution which results in high performance search,
> and some things that came to my mind were
> 
> *         extracting information and storing it in a database (e.g.
> HSQLDB)
> 
> *         using lucene
> 
>  
> 
> Is there detailed documentation available somewhere on using these?
> And what would you recommend for my use case?
> 
>  
> 
> I already found some stuff but no real quick-start material.
> 
> http://cocoon.apache.org/2.1/userdocs/concepts/xmlsearching.html
> 
> http://cocoon.apache.org/2.2/blocks/hsqldb-client/1.0/
> 
> http://cocoon.apache.org/2.2/blocks/hsqldb-server/1.0/
> 
>  
> 
> Thx in advance,
> 
> Robby Pelssers
> 
>  
> 
>  
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@cocoon.apache.org
For additional commands, e-mail: users-h...@cocoon.apache.org

