RE: Transition from OODT 0.6 to 0.12 cannot find extractor specifications

2016-04-06 Thread Konstantinos Mavrommatis
:54 PM, Konstantinos Mavrommatis < kmavromma...@celgene.com> wrote: > Hi, > It seems to be happening for a number of types of files that I have in > the mimetypes.xml. > A few things are puzzling to me: this file which is a .gz file is not > processed by the regular tika mim

RE: Transition from OODT 0.6 to 0.12 cannot find extractor specifications

2016-04-04 Thread Konstantinos Mavrommatis
l 2, 2016, Konstantinos Mavrommatis < kmavromma...@celgene.com> wrote: > Hi, > I am trying to replicate a fully functional service that I had set up a > long time ago using OODT 0.6 but I am having the following problem > that does not allow me to ingest files. When

Transition from OODT 0.6 to 0.12 cannot find extractor specifications

2016-04-02 Thread Konstantinos Mavrommatis
Hi, I am trying to replicate a fully functional service that I had set up a long time ago using OODT 0.6, but I am having the following problem that does not allow me to ingest files. When I try to ingest files with the extension fastq.gz I get the line: WARNING: No extractor specs specified for

RE: Setting up filemanager with SOLR 5.5

2016-03-24 Thread Konstantinos Mavrommatis
@oodt.apache.org Subject: Re: Setting up filemanager with SOLR 5.5 Hi Kos, The schema field definitions in the file manager schema sound like they need a bit of an overhaul. Are you able to file an issue against master and submit a PR? Thanks Lewis On Thursday, March 24, 2016, Konstantinos Mavrommatis

RE: Setting up filemanager with SOLR 5.5

2016-03-24 Thread Konstantinos Mavrommatis
> Tom > > On Thu, Mar 24, 2016 at 7:15 AM, Konstantinos Mavrommatis < > kmavromma...@celgene.com> wrote: > >> Hi, >> I am using oodt v 0.12 >> Interestingly the file etc/logging.properties was not in its place >> although this was a clean installa

RE: Setting up filemanager with SOLR 5.5

2016-03-24 Thread Konstantinos Mavrommatis
ID as opposed to the opaque object... the latter is relatively useless. On Wed, Mar 23, 2016 at 6:48 PM, Konstantinos Mavrommatis < kmavromma...@celgene.com> wrote: > Hi, > I have set up oodt using RADiX. > > When I use the default Lucene catalog factory I manage to ingest a >

Setting up filemanager with SOLR 5.5

2016-03-23 Thread Konstantinos Mavrommatis
Hi, I have set up oodt using RADiX. When I use the default Lucene catalog factory I manage to ingest a file with no problem: # ./filemgr-client --url http://localhost:9000 --operation --ingestProduct --productName test.txt --productStructure Flat --productTypeName GenericFile --metadataFile

RE: Problem installing oodt 0.12.

2016-03-02 Thread Konstantinos Mavrommatis
. Give maven 3 a whirl and see if you get any further. > > Tom > On 2 Mar 2016 08:02, "Konstantinos Mavrommatis" > <kmavromma...@celgene.com> > wrote: > >> Thanks, >> I tried the -DskipTests but

RE: Problem installing oodt 0.12.

2016-03-02 Thread Konstantinos Mavrommatis
o ports and things because from time to time it does cause > issues. > > For now I'd suggest you run -DskipTests as they've been run and passed > prior to release. Not ideal but will unblock you. > > I'll try and get that test resolved properly, soon. > > Cheers,

Problem installing oodt 0.12.

2016-03-01 Thread Konstantinos Mavrommatis
Hi, I am trying to install the latest oodt 0.12 from the src.zip file. The installation is on a clean Ubuntu 14.04 with Maven 2.2.1 and Oracle Java JDK 1.8.0_74. After unzipping the archive I run mvn clean install and I get the following error: $ more

How to ingest files when metadata contain non standard characters?

2014-10-08 Thread Konstantinos Mavrommatis
Hi, I am trying to ingest a large number of files. The metadata for these files exist in .met files. Many of the metadata fields contain characters like '$' etc. Running crawler on these metadata results in failure. When I try to escape the characters using HTML encode e.g. '>' becomes &gt; etc

RE: How to ingest files when metadata contain non standard characters?

2014-10-08 Thread Konstantinos Mavrommatis
@oodt.apache.org Subject: Re: How to ingest files when metadata contain non standard characters? Hi Kos, I take you up on your challenge ;) However I don't know if this will fix it. On Tue, Oct 7, 2014 at 11:31 PM, Konstantinos Mavrommatis kmavromma...@celgene.com

RE: How to ingest files when metadata contain non standard characters?

2014-10-08 Thread Konstantinos Mavrommatis
files when metadata contain non standard characters? Hi Kos, Thanks for the reply. On Wed, Oct 8, 2014 at 5:16 PM, Konstantinos Mavrommatis kmavromma...@celgene.com wrote: I escaped the characters using the CGI::escapeHTML function from the CGI perl module. Wow. I am surprised

RE: How to ingest files when metadata contain non standard characters?

2014-10-08 Thread Konstantinos Mavrommatis
Here is the offending file before escape: <cas:metadata xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas"> <keyval> <key>derived_from</key> <val>/gpfs/celgene/reference/v1/Homo-sapiens/GRCh37.p12/SailFishIndex</val>

UPDATE-PROBABLY SOLVED: File manager keeps connections to SOLR in CLOSE_WAIT state for hours

2014-07-10 Thread Konstantinos Mavrommatis
. Lewis [0] https://issues.apache.org/jira/browse/SOLR-3280 On Tue, Jul 8, 2014 at 7:14 AM, Konstantinos Mavrommatis kmavromma...@celgene.com wrote: Hi, I have set up OODT filemanager on port 9000, using SOLR as the indexing service

RE: UPDATE-PROBABLY SOLVED: File manager keeps connections to SOLR in CLOSE_WAIT state for hours

2014-07-10 Thread Konstantinos Mavrommatis
, Paul Ramirez On Jul 10, 2014, at 5:23 AM, Konstantinos Mavrommatis kmavromma...@celgene.com wrote: Hi, Following the suggestion at http://blogs.nuxeo.com/development/2013/02/using-httpclient-properly-avoid-closewait-tcp-connections/ I modified the code in src/main

Re: How to process files in a sorted order

2013-11-15 Thread Konstantinos Mavrommatis
, but I don't believe there is a way to accomplish your goal of enforcing a sorting algorithm within the crawler config. I think you will have to write your own crawler that will implement your sorting logic. Sincerely, Cameron Goodale On Thu, Nov 7, 2013 at 7:44 PM, Konstantinos Mavrommatis

How to process files in a sorted order

2013-11-07 Thread Konstantinos Mavrommatis
Hi, In my environment I am using cas-crawler to process directories of 1000s of files. The metadata for these files are extracted automatically using the mimetypes definitions and small wrapper scripts. In these directories some of the files are derived from other files and metadata from the