Seems like a transient mvn repo problem. Can you try again?

Wolfgang.

On Jul 23, 2013, at 1:36 AM, Flavio Pompermaier wrote:

> Still problems when building CDK Data Core Module 0.4.2-SNAPSHOT. Maven hangs
> at:
>
> Downloading:
> https://repository.cloudera.com/artifactory/cloudera-repos/com/twitter/parquet-avro/1.0.0-SNAPSHOT/maven-metadata.xml
> Downloading:
> https://oss.sonatype.org/content/repositories/snapshots/com/twitter/parquet-avro/1.0.0-SNAPSHOT/maven-metadata.xml
> Jul 23, 2013 10:35:41 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> INFO: I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
> Jul 23, 2013 10:35:41 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> INFO: I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
> Jul 23, 2013 10:35:41 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> INFO: Retrying request
> Jul 23, 2013 10:35:41 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> INFO: Retrying request
>
>
> On Tue, Jul 23, 2013 at 10:33 AM, Flavio Pompermaier wrote:
> Sorry, this is caused by our mirror. I'll remove it and retry.
>
>
> On Tue, Jul 23, 2013 at 10:31 AM, Flavio Pompermaier wrote:
> I still get this error:
>
> Failed to read artifact descriptor for
> commons-daemon:commons-daemon:jar:1.0.3: Could not transfer artifact
> commons-daemon:commons-daemon:pom:1.0.3 from/to repo
> (http://dev.okkam.it/artifactory/repo): Failed to transfer file:
> http://dev.okkam.it/artifactory/repo/commons-daemon/commons-daemon/1.0.3/commons-daemon-1.0.3.pom.
> Return code is: 409 -> [Help 1]
>
>
> On Tue, Jul 23, 2013 at 10:22 AM, Wolfgang Hoschek wrote:
> Tests pass on Java 6 but fail on Java 7. Correspondingly, I have filed
> https://issues.cloudera.org/browse/CDK-80. We'll fix it. Meanwhile, please
> try Java 6.
>
> Wolfgang.
>
> On Jul 23, 2013, at 12:51 AM, Flavio Pompermaier wrote:
>
> > I tried to download the current trunk but it doesn't compile. For example,
> > it hangs on
> > https://repository.cloudera.com/artifactory/cloudera-repos/com/twitter/parquet-avro/1.0.0-SNAPSHOT/maven-metadata.xml
> > which doesn't exist anymore.
> >
> >
> > On Mon, Jul 22, 2013 at 11:14 PM, Flavio Pompermaier wrote:
> > You couldn't be more precise ;)
> >
> > Thanks,
> > Flavio
> >
> > On Mon, Jul 22, 2013 at 11:02 PM, Wolfgang Hoschek wrote:
> > Docs for the xquery and xslt morphline commands are here (look for
> > "xquery"):
> > https://github.com/cloudera/cdk/blob/master/cdk-morphlines/src/site/confluence/morphlinesReferenceGuide.confluence
> >
> > Example morphlines for the new xquery and xslt commands are here:
> > https://github.com/cloudera/cdk/tree/master/cdk-morphlines/cdk-morphlines-saxon/src/test/resources/test-morphlines
> >
> > Sample input data is here:
> > https://github.com/cloudera/cdk/tree/master/cdk-morphlines/cdk-morphlines-saxon/src/test/resources/test-documents
> >
> > Unit tests are here:
> > https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-saxon/src/test/java/com/cloudera/cdk/morphline/saxon/SaxonMorphlineTest.java
> >
> > Wolfgang.
> >
> > On Jul 22, 2013, at 1:41 PM, Flavio Pompermaier wrote:
> >
> > > OK, I'll try to follow the code! Just one last thing: for morphine-neon I
> > > managed to find the tests (in the cdk repository), but for the new xslt and
> > > xquery commands I'm not able to find the test code. Could you give me a pointer?
> > >
> > > On Mon, Jul 22, 2013 at 9:21 PM, Wolfgang Hoschek wrote:
> > > There are many tests for this in the morphlines repo.
> > >
> > > Wolfgang.
> > >
> > > On Jul 22, 2013, at 11:43 AM, Flavio Pompermaier wrote:
> > >
> > > >
> > > > Thank you for the great support Wolfgang!
> > > > Flume + Morphlines is undoubtedly an exciting road, but it's taking me
> > > > too much time :(
> > > > Do you think you could add some more tests covering readJson and the
> > > > new xquery and xslt commands in trunk?
> > > >
> > > > Best,
> > > > Flavio
> > > >
> > > > On Mon, Jul 22, 2013 at 8:12 PM, Wolfgang Hoschek wrote:
> > > > Looks like the DcXMLParser spits out a metadata field called "title"
> > > > and another title as part of the Tika XML stream. That metadata field
> > > > is then added to the Solr document by SolrCell. If you add "title" to
> > > > the captures, the title from the XML stream gets added as well by
> > > > SolrCell.
> > > >
> > > > JSON support has been released in morphlines-0.4.1 (which Flume trunk
> > > > now depends on):
> > > > http://cloudera.github.io/cdk/docs/0.4.1/cdk-morphlines/morphlinesReferenceGuide.html#readJson
> > > >
> > > > Note that Tika XML doesn't really support/capture XPath extraction with
> > > > SolrCell. We have added proper support for reading, extracting, and
> > > > transforming XML and HTML with XPath, XQuery, and XSLT on the current
> > > > morphlines trunk (not yet released), similar to the way we already
> > > > support JSON and Avro. This should make XML handling a lot more
> > > > straightforward and make the very limited SolrCell XML approach
> > > > obsolete. Look for the new "xquery" and "xslt" commands in
> > > > https://github.com/cloudera/cdk/blob/master/cdk-morphlines/src/site/confluence/morphlinesReferenceGuide.confluence
> > > >
> > > > Meanwhile, consider using these new commands, or use JSON or Avro, or
> > > > write your own custom morphline commands that extract whatever you want
> > > > from your XML data.
> > > >
> > > > Wolfgang.
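To make the "two titles" explanation above concrete, here is a minimal Python sketch under a simplified, assumed model in which a Solr document is a map from field name to a list of values. The helper names `build_doc` and `check_single_valued` are hypothetical, and this is not how SolrCell or morphlines are implemented; it only shows how appending the XML-stream capture on top of the DcXMLParser metadata leaves "title" with two values (the real title plus an empty one), which a field declared `multiValued="false"` rejects.

```python
from collections import defaultdict

# Hypothetical model: a Solr document as field name -> list of values.
# Not the SolrCell or morphlines implementation.


def build_doc(parser_metadata, xml_captures):
    """Merge parser metadata with values captured from the XML stream.

    Both sources append to the same field, which is how one field can
    end up holding more than one value.
    """
    doc = defaultdict(list)
    for source in (parser_metadata, xml_captures):
        for name, values in source.items():
            doc[name].extend(values)
    return dict(doc)


def check_single_valued(doc, single_valued_fields):
    """Return the fields declared multiValued="false" that hold >1 value."""
    return [f for f in single_valued_fields if len(doc.get(f, [])) > 1]


# The parser metadata supplies the real title; capturing "title" from
# the XML stream adds a second (here empty) value.
doc = build_doc({"title": ["Tika test document"]}, {"title": [""]})
print(doc)                                  # {'title': ['Tika test document', '']}
print(check_single_valued(doc, ["title"]))  # ['title']
```

Without the extra capture (an empty `xml_captures` map), the same check returns an empty list, which matches the observation that everything works when nothing is added to fmap or capture.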
> > > >
> > > > On Jul 22, 2013, at 9:18 AM, Flavio Pompermaier wrote:
> > > >
> > > > > Hi all,
> > > > > I'm trying to understand how to "master" Morphline configuration
> > > > > files in order to put some data into Solr, but I'm facing some problems
> > > > > with TestMorphlineSolrSink. This is what I did:
> > > > >
> > > > > 1) Since I want to index the title of testXML.xml (i.e. "Tika
> > > > > test document"), I commented out all the parsers except
> > > > > org.apache.tika.parser.xml.DcXMLParser (which parses Dublin Core
> > > > > metadata).
> > > > > 2) In schema.xml I added the following field:
> > > > > <field name="title" type="text_en" indexed="true" stored="true"
> > > > > multiValued="false" />
> > > > >
> > > > > But:
> > > > > - If I don't add anything to fmap or capture, everything works fine,
> > > > > but I don't understand why (who fills that field?). If instead I add
> > > > > title to capture and/or title: title (or dc_title:title) to fmap, Solr
> > > > > complains that 2 values are retrieved for 'title' (debugging, I see
> > > > > the title and one empty value in the 'title' metadata array).
> > > > > So the problem is that everything works magically if the field is
> > > > > named title, but if I change its name to something like doc_title
> > > > > there's no way to make it non-multivalued. Am I right? How can I fix
> > > > > this problem?
> > > > > - I'd like to manage JSON files. How can I map JSON fields to Solr
> > > > > fields? Could someone give a simple example?
> > > > >
> > > > > Best,
> > > > > Flavio
>
> --
> Flavio Pompermaier
> Development Department
> _______________________________________________
> OKKAMSrl - www.okkam.it
>
> Phone: +(39) 0461 283 702
> Fax: + (39) 0461 186 6433
> Headquarters: Trento (Italy), fraz. Villazzano, Salita dei Molini 2
> Registered office: Trento (Italy), via Segantini 23
>
> Confidentiality notice. This e-mail transmission may contain legally
> privileged and/or confidential information. Please do not read it if you are
> not the intended recipient(s). Any use, distribution, reproduction or
> disclosure by any other person is strictly prohibited. If you have received
> this e-mail in error, please notify the sender and destroy the original
> transmission and its attachments without reading or saving it in any manner.
>
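For the question of how to map JSON fields to Solr fields, here is a minimal Python sketch of the idea: follow a path into the parsed JSON and copy whatever it finds into a named output field. It is only an illustration; in an actual morphline this is done with the readJson command and the related commands documented in the morphlines reference guide linked in the thread, whose configuration syntax is not reproduced here. The slash-separated path format and the helper names `extract_paths` and `json_to_solr_fields` are assumptions made for this sketch.

```python
import json

# Hypothetical helpers for illustration; not the morphlines readJson API.


def extract_paths(node, path):
    """Follow a slash-separated path such as "/doc/title" through parsed JSON.

    JSON arrays are flattened, so every matching leaf is returned; missing
    keys simply produce no values.
    """
    nodes = [node]
    for step in (s for s in path.split("/") if s):
        next_nodes = []
        for n in nodes:
            if isinstance(n, dict) and step in n:
                child = n[step]
                if isinstance(child, list):
                    next_nodes.extend(child)
                else:
                    next_nodes.append(child)
        nodes = next_nodes
    return nodes


def json_to_solr_fields(text, mapping):
    """Build a Solr-style document; mapping is {solr_field: json_path}."""
    data = json.loads(text)
    doc = {}
    for field, path in mapping.items():
        values = extract_paths(data, path)
        if values:  # skip fields with no match instead of emitting empty values
            doc[field] = values
    return doc


record = '{"doc": {"title": "Tika test document", "tags": ["xml", "solr"]}}'
mapping = {"title": "/doc/title", "keywords": "/doc/tags", "author": "/doc/author"}
print(json_to_solr_fields(record, mapping))
# {'title': ['Tika test document'], 'keywords': ['xml', 'solr']}
```

Because arrays are flattened, a JSON array maps naturally onto a multi-valued Solr field; a field declared `multiValued="false"` should therefore only be mapped to a path that yields at most one value.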
