Re: integrating Accumulo with solr

2014-07-31 Thread Jack Krupansky
To be clear, I wasn't suggesting that Accumulo was the cause of integration 
complexity - EVERY NoSQL will have integration complexity of comparable 
magnitude. The advantage of DataStax Enterprise or Sqrrl Enterprise is that 
they have done the integration work for you.


-- Jack Krupansky

Re: integrating Accumulo with solr

2014-07-30 Thread Ali Nazemian
Sure,
Thank you very much for your guidance. I don't think I am that kind of
gunslinger, so I will probably go for another NoSQL database that can be
integrated with Solr/Elasticsearch much more easily :)
Best regards.


Re: integrating Accumulo with solr

2014-07-27 Thread Jack Krupansky
Right, and that's exactly what DataStax Enterprise provides (at great 
engineering effort!) - synchronization of database updates and search 
indexing. Sure, you can do it as well, but that's a significant engineering
challenge on both sides of the equation, not a simple plug-and-play
configuration setting or something you get by writing a simple connector.


But, hey, if you consider yourself one of those true hard-core gunslingers 
then you'll be able to code that up in a weekend without any of our 
assistance, right?


In short, synchronizing two data stores is a real challenge. Yes, it is 
doable, but... it is non-trivial. Especially if both stores are distributed 
clusters. Maybe now you can guess why the Sqrrl guys went the Lucene route 
instead of Solr.
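
To make the difficulty concrete, here is a minimal sketch of a dual write with a retry queue. It is only an illustration under stated assumptions: a plain dict stands in for the database, `index_fn` stands in for whatever call sends a document to the search index, and `SyncWriter` and `pending` are hypothetical names, not part of Accumulo, Cassandra, or Solr.

```python
from collections import deque


class SyncWriter:
    """Write to a primary store, then mirror the change to a search index."""

    def __init__(self, store, index_fn):
        self.store = store          # stand-in for the database (a dict)
        self.index_fn = index_fn    # callable that indexes one (row_id, value)
        self.pending = deque()      # changes the index has not accepted yet

    def put(self, row_id, value):
        self.store[row_id] = value              # primary write happens first
        self.pending.append((row_id, value))    # remember it until indexed
        self.flush()

    def flush(self):
        """Index pending changes in order; stop at the first failure."""
        while self.pending:
            row_id, value = self.pending[0]
            try:
                self.index_fn(row_id, value)
            except ConnectionError:
                return False    # index unavailable: the stores now disagree
            self.pending.popleft()
        return True
```

Even this toy version leaves open questions that a real deployment has to answer: the pending queue lives in memory and is lost on a crash, and nothing here handles deletes, ordering across several writers, or partial failures on a distributed cluster.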


I'm certainly not suggesting that it can't be done. Just highlighting the 
challenge of such a task.


Just to be clear, you are referring to sync mode and not mere ETL, which 
people do all the time with batch scripts, Java extraction and ingestion 
connectors, and cron jobs.
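
For contrast, the batch ETL path is easy to sketch. The following is a minimal illustration, not anyone's production connector: it assumes rows arrive as (row id, text) pairs from some export, and the field names `id` and `body` and the helper `to_solr_batches` are hypothetical. Each payload is a JSON array of documents of the kind Solr's JSON update handler accepts; the HTTP POST itself is left out.

```python
import json


def to_solr_batches(rows, batch_size=500):
    """Yield JSON payloads, each holding up to batch_size Solr documents."""
    batch = []
    for row_id, text in rows:
        # "id" and "body" are assumed field names for this sketch.
        batch.append({"id": row_id, "body": text})
        if len(batch) == batch_size:
            yield json.dumps(batch)
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield json.dumps(batch)
```

A cron job could run an export, feed the rows through a function like this, and post each payload to the index. That covers ETL, but it says nothing about keeping the two stores consistent between runs.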


Give it a shot and let us know how it works out.

-- Jack Krupansky

Re: integrating Accumulo with solr

2014-07-26 Thread Ali Nazemian
Dear Jack,
Hi,
One more thing to mention: I don't want to use Solr or Lucene to index
Accumulo or to run full-text search inside it. I am looking to keep both in
sync. I mean importing some parts of the data into Solr for indexing. For
this I probably need something like a trigger in an RDBMS: I have to define
something (probably with an Accumulo iterator) that imports into Solr when
new data is inserted.
Regards.

Re: integrating Accumulo with solr

2014-07-25 Thread Ali Nazemian
Dear Jack,
Actually, I am going to do a cost-benefit analysis of in-house development
versus going for Sqrrl support.
Best regards.


On Thu, Jul 24, 2014 at 11:48 PM, Jack Krupansky j...@basetechnology.com
wrote:

 Like I said, you're going to have to be a real, hard-core gunslinger to do
 that well. Sqrrl uses Lucene directly, BTW:

 Full-Text Search: Utilizing open-source Lucene and custom indexing
 methods, Sqrrl Enterprise users can conduct real-time, full-text search
 across data in Sqrrl Enterprise.

 See:
 http://sqrrl.com/product/search/

 Out of curiosity, why are you not using that integrated Lucene support of
 Sqrrl Enterprise?


 -- Jack Krupansky

 -Original Message- From: Ali Nazemian
 Sent: Thursday, July 24, 2014 3:07 PM

 To: solr-user@lucene.apache.org
 Subject: Re: integrating Accumulo with solr

 Dear Jack,
 Thank you. I am aware of DataStax, but I am looking to integrate Accumulo
 with Solr. This is something like what the Sqrrl guys offer.
 Regards.


 On Thu, Jul 24, 2014 at 7:27 PM, Jack Krupansky j...@basetechnology.com
 wrote:

 If you are not a true hard-core gunslinger who is willing to dive in and
 integrate the code yourself, instead you should give serious consideration
 to a product such as DataStax Enterprise that fully integrates and packages
 a NoSQL database (Cassandra) and Solr for search. The security aspects are
 still a work in progress, but certainly headed in the right direction. And
 it has Hadoop and Spark integration as well.

 See:
 http://www.datastax.com/what-we-offer/products-services/datastax-enterprise

 -- Jack Krupansky

Re: integrating Accumulo with solr

2014-07-24 Thread Ali Nazemian
Dear Joe,
Hi,
I am going to store the crawled web pages in Accumulo as the main storage
part of my project, and I need to give this data to Solr for indexing and
user searches. I need to do some social and web analysis on my data as well
as have some security features. Therefore Accumulo is my choice for the
database part, and for indexing and search I am going to use Solr. Would you
please guide me through that?



On Thu, Jul 24, 2014 at 1:28 AM, Joe Gresock jgres...@gmail.com wrote:

 We store data in both Solr and Accumulo -- do you have more details about
 what kind of data and indexing you want?  Is there a reason you're thinking
 of using both databases in particular?


 On Wed, Jul 23, 2014 at 5:17 AM, Ali Nazemian alinazem...@gmail.com
 wrote:

  Dear All,
  Hi,
  I was wondering whether anybody out there has tried to integrate Solr
  with Accumulo. I was thinking about using Accumulo on top of HDFS and
  using Solr to index the data inside Accumulo. Do you have any idea how I
  can do such an integration?
 
  Best regards.
 
  --
  A.Nazemian
 



 --
 I know what it is to be in need, and I know what it is to have plenty.  I
 have learned the secret of being content in any and every situation,
 whether well fed or hungry, whether living in plenty or in want.  I can do
 all this through him who gives me strength.*-Philippians 4:12-13*




-- 
A.Nazemian


Re: integrating Accumulo with solr

2014-07-24 Thread Joe Gresock
Ali,

Sounds like a good choice.  It's pretty standard to store the primary
storage id as a field in Solr so that you can search the full text in Solr
and then retrieve the full document elsewhere.

I would recommend creating a document structure in Solr with whatever
fields you want indexed (most likely as text_en, etc.), and then store a
string field named content_id, which would be the Accumulo row id that
you look up with a scan.
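
The search-then-fetch pattern described above can be sketched in a few lines. This is only an illustration: a dict stands in for the Accumulo table, the list of hits stands in for a Solr response, and `body`, `build_solr_doc`, and `fetch_full_documents` are hypothetical names. Only `content_id` comes from the recommendation itself.

```python
def build_solr_doc(row_id, text):
    """Build a Solr document: indexed text plus the Accumulo row id."""
    # "body" is an assumed field name; it would use a type such as text_en.
    return {"content_id": row_id, "body": text}


def fetch_full_documents(search_hits, primary_store):
    """Resolve search hits to full documents through their content_id."""
    # primary_store is a dict stand-in for a lookup against Accumulo.
    return [primary_store[hit["content_id"]] for hit in search_hits]
```

The index holds only what is needed for searching, and the row id is the single link back to the primary copy of each document.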

One caveat -- Accumulo will be protected at the cell level, but if you need
your Solr search results to be protected by complex authorization strings
similar to Accumulo, you will need to write your own QParserPlugin and use
post filtering:
http://java.dzone.com/articles/custom-security-filtering-solr

The code you see in that article is written for an earlier version of Solr,
but it's not too difficult to adjust it for the latest (we've done so in
our project).  Once you've implemented this, you would store an
authorizations string field in each Solr document, and pass in the
authorizations that the user has access to in the fq parameter of every
query.  It's also not too bad to write something that parses the Accumulo
authorizations string (like A&B&(C|D|E|F)) and interprets it accordingly in
the QParserPlugin.
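
As a rough idea of what such parsing involves, here is a minimal sketch of an evaluator for expressions of that shape. It is not the code from our project and not Accumulo's own parser: it assumes labels are combined with & and |, that parentheses group sub-expressions, that & and | may not be mixed at one level without parentheses, and that an empty expression is visible to everyone. The names `tokenize` and `is_visible` are hypothetical.

```python
import re

# A label, or one of the operator/parenthesis characters.
TOKEN = re.compile(r"[A-Za-z0-9_.:/-]+|[&|()]")


def tokenize(expr):
    tokens = TOKEN.findall(expr)
    if "".join(tokens) != expr:
        raise ValueError(f"unexpected character in {expr!r}")
    return tokens


def is_visible(expr, auths):
    """Return True if the set of user authorizations satisfies expr."""
    if not expr:
        return True  # assumption: an empty expression is visible to all
    tokens = tokenize(expr)
    pos = 0

    def parse_expr():
        nonlocal pos
        value = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is None:
                op = tokens[pos]
            elif tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            rhs = parse_term()
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in auths  # a label is satisfied if the user holds it

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return result
```

In a real plugin the same logic would be written in Java and applied to the authorizations field of each candidate document.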

This will give you true row level security in Solr and Accumulo, and it
performs quite well in Solr.
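
The per-query filter could be assembled roughly as follows. This is a sketch under assumptions: `authfilter` is a hypothetical name for the custom parser as registered in solrconfig.xml, and `auths` is a hypothetical local parameter it would read. The `cache=false` and `cost=100` local parameters are the usual way to ask Solr to run a filter as a post filter.

```python
from urllib.parse import urlencode


def build_query_params(user_query, user_auths, parser_name="authfilter"):
    """Build URL-encoded Solr query parameters with an authorization filter."""
    # Sort so the same set of authorizations always yields the same fq string.
    auths = ",".join(sorted(user_auths))
    # parser_name and the auths parameter are assumptions for this sketch.
    fq = f"{{!{parser_name} cache=false cost=100 auths={auths}}}"
    return urlencode({"q": user_query, "fq": fq})
```

The authorizations should come from the authenticated session on the server side, never from a value the end user can edit.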

Let me know if you have any other questions.

Joe





-- 
I know what it is to be in need, and I know what it is to have plenty.  I
have learned the secret of being content in any and every situation,
whether well fed or hungry, whether living in plenty or in want.  I can do
all this through him who gives me strength.*-Philippians 4:12-13*


Re: integrating Accumulo with solr

2014-07-24 Thread Ali Nazemian
Thank you very much. Nice idea, but how can Solr and Accumulo be
synchronized in this way?
I know that Solr can be integrated with HDFS, and Accumulo also works on
top of HDFS. So can I use HDFS as the integration point? I mean setting Solr
to use HDFS as a source of documents as well as the destination of documents.
Regards.


On Thu, Jul 24, 2014 at 4:33 PM, Joe Gresock jgres...@gmail.com wrote:

 Ali,

 Sounds like a good choice.  It's pretty standard to store the primary
 storage id as a field in Solr so that you can search the full text in Solr
 and then retrieve the full document elsewhere.

 I would recommend creating a document structure in Solr with whatever
 fields you want indexed (most likely as text_en, etc.), and then store a
 string field named content_id, which would be the Accumulo row id that
 you look up with a scan.

 One caveat -- Accumulo will be protected at the cell level, but if you need
 your Solr search results to be protected by complex authorization strings
 similar to Accumulo, you will need to write your own QParserPlugin and use
 post filtering:
 http://java.dzone.com/articles/custom-security-filtering-solr

 The code you see in that article is written for an earlier version of Solr,
 but it's not too difficult to adjust it for the latest (we've done so in
 our project).  Once you've implemented this, you would store an
 authorizations string field in each Solr document, and pass in the
 authorizations that the user has access to in the fq parameter of every
 query.  It's also not too bad to write something that parses the Accumulo
 authorizations string (like AB(C|D|E|F)) and interpret it accordingly in
 the QParserPlugin.

 This will give you true row level security in Solr and Accumulo, and it
 performs quite well in Solr.

 Let me know if you have any other questions.

 Joe


 On Thu, Jul 24, 2014 at 4:07 AM, Ali Nazemian alinazem...@gmail.com
 wrote:

  Dear Joe,
  Hi,
  I am going to store the crawled web pages in Accumulo as the main storage
  part of my project, and I need to give this data to Solr for indexing and
  user searches. I need to do some social and web analysis on my data as
  well as having some security features. Therefore Accumulo is my choice for
  the database part, and for indexing and search I am going to use Solr.
  Would you please guide me through that?
 
 
 
  On Thu, Jul 24, 2014 at 1:28 AM, Joe Gresock jgres...@gmail.com wrote:
 
   We store data in both Solr and Accumulo -- do you have more details about
   what kind of data and indexing you want?  Is there a reason you're
   thinking of using both databases in particular?
  
  
   On Wed, Jul 23, 2014 at 5:17 AM, Ali Nazemian alinazem...@gmail.com
   wrote:
  
    Dear All,
    Hi,
    I was wondering whether anybody out there has tried to integrate Solr
    with Accumulo. I was thinking about using Accumulo on top of HDFS and
    using Solr to index the data inside Accumulo. Do you have any idea how
    I can do such an integration?

    Best regards.

    --
    A.Nazemian
   
  
  
  
   --
   I know what it is to be in need, and I know what it is to have plenty.  I
   have learned the secret of being content in any and every situation,
   whether well fed or hungry, whether living in plenty or in want.  I can do
   all this through him who gives me strength.*-Philippians 4:12-13*
  
 
 
 
  --
  A.Nazemian
 



 --
 I know what it is to be in need, and I know what it is to have plenty.  I
 have learned the secret of being content in any and every situation,
 whether well fed or hungry, whether living in plenty or in want.  I can do
 all this through him who gives me strength.*-Philippians 4:12-13*




-- 
A.Nazemian


Re: integrating Accumulo with solr

2014-07-24 Thread Jack Krupansky
If you are not a true hard-core gunslinger who is willing to dive in and
integrate the code yourself, you should give serious consideration to a
product such as DataStax Enterprise, which fully integrates and packages a
NoSQL database (Cassandra) and Solr for search. The security aspects are
still a work in progress, but certainly headed in the right direction. And
it has Hadoop and Spark integration as well.


See:
http://www.datastax.com/what-we-offer/products-services/datastax-enterprise

-- Jack Krupansky

-Original Message- 
From: Ali Nazemian

Sent: Thursday, July 24, 2014 10:30 AM
To: solr-user@lucene.apache.org
Subject: Re: integrating Accumulo with solr

Thank you very much. Nice idea, but how can Solr and Accumulo be
synchronized in this way?
I know that Solr can be integrated with HDFS, and Accumulo also runs on top
of HDFS. So can I use HDFS as the integration point? I mean, set Solr to use
HDFS as the source of documents as well as the destination of documents.
Regards.


--
A.Nazemian 



Re: integrating Accumulo with solr

2014-07-24 Thread Erik Hatcher
Just FYI, the blog post Joe mentioned below (authored by me) has been updated
for Solr 4.x at the original blog location:

   http://searchhub.org/2012/02/22/custom-security-filtering-in-solr/

Erik

On Jul 24, 2014, at 8:03 AM, Joe Gresock jgres...@gmail.com wrote:

 Ali,
 
 Sounds like a good choice.  It's pretty standard to store the primary
 storage id as a field in Solr so that you can search the full text in Solr
 and then retrieve the full document elsewhere.
 
 I would recommend creating a document structure in Solr with whatever
 fields you want indexed (most likely as text_en, etc.), and then store a
 string field named content_id, which would be the Accumulo row id that
 you look up with a scan.
 
 One caveat -- Accumulo will be protected at the cell level, but if you need
 your Solr search results to be protected by complex authorization strings
 similar to Accumulo, you will need to write your own QParserPlugin and use
 post filtering:
 http://java.dzone.com/articles/custom-security-filtering-solr
 
 The code you see in that article is written for an earlier version of Solr,
 but it's not too difficult to adjust it for the latest (we've done so in
 our project).  Once you've implemented this, you would store an
 authorizations string field in each Solr document, and pass in the
 authorizations that the user has access to in the fq parameter of every
 query.  It's also not too bad to write something that parses the Accumulo
 authorizations string (like A&B&(C|D|E|F)) and interprets it accordingly in
 the QParserPlugin.
 
 This will give you true row level security in Solr and Accumulo, and it
 performs quite well in Solr.
 
 Let me know if you have any other questions.
 
 Joe
 
 



Re: integrating Accumulo with solr

2014-07-24 Thread Ali Nazemian
Dear Jack,
Thank you. I am aware of DataStax, but I am looking to integrate Accumulo
with Solr. This is something like what the Sqrrl guys offer.
Regards.


On Thu, Jul 24, 2014 at 7:27 PM, Jack Krupansky j...@basetechnology.com
wrote:

 If you are not a true hard-core gunslinger who is willing to dive in and
 integrate the code yourself, instead you should give serious consideration
 to a product such as DataStax Enterprise that fully integrates and packages
 a NoSQL database (Cassandra) and Solr for search. The security aspects are
 still a work in progress, but certainly headed in the right direction. And
 it has Hadoop and Spark integration as well.

 See:
 http://www.datastax.com/what-we-offer/products-services/datastax-enterprise

 -- Jack Krupansky


-- 
A.Nazemian


Re: integrating Accumulo with solr

2014-07-24 Thread Jack Krupansky
Like I said, you're going to have to be a real, hard-core gunslinger to do 
that well. Sqrrl uses Lucene directly, BTW:


Full-Text Search: Utilizing open-source Lucene and custom indexing methods, 
Sqrrl Enterprise users can conduct real-time, full-text search across data 
in Sqrrl Enterprise.


See:
http://sqrrl.com/product/search/

Out of curiosity, why are you not using the integrated Lucene support in
Sqrrl Enterprise?


-- Jack Krupansky

-Original Message- 
From: Ali Nazemian

Sent: Thursday, July 24, 2014 3:07 PM
To: solr-user@lucene.apache.org
Subject: Re: integrating Accumulo with solr

Dear Jack,
Thank you. I am aware of DataStax, but I am looking to integrate Accumulo
with Solr. This is something like what the Sqrrl guys offer.
Regards.



Re: integrating Accumulo with solr

2014-07-23 Thread Joe Gresock
We store data in both Solr and Accumulo -- do you have more details about
what kind of data and indexing you want?  Is there a reason you're thinking
of using both databases in particular?


On Wed, Jul 23, 2014 at 5:17 AM, Ali Nazemian alinazem...@gmail.com wrote:

 Dear All,
 Hi,
 I was wondering whether anybody out there has tried to integrate Solr
 with Accumulo. I was thinking about using Accumulo on top of HDFS and using
 Solr to index the data inside Accumulo. Do you have any idea how I can do
 such an integration?

 Best regards.

 --
 A.Nazemian




-- 
I know what it is to be in need, and I know what it is to have plenty.  I
have learned the secret of being content in any and every situation,
whether well fed or hungry, whether living in plenty or in want.  I can do
all this through him who gives me strength.*-Philippians 4:12-13*