This is an automated email from the ASF dual-hosted git repository.

aromanenko pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
     new a9dcf95453d [CdapIO] Add readme for CdapIO. Update readme for 
SparkReceiverIO. (#23959)
a9dcf95453d is described below

commit a9dcf95453d43621ef3ce63782f7cd6c47399bb6
Author: Vitaly Terentyev <vitaly.terent...@akvelon.com>
AuthorDate: Mon Dec 5 14:45:31 2022 +0400

    [CdapIO] Add readme for CdapIO. Update readme for SparkReceiverIO. (#23959)
    
    * Add README for CdapIO. Update README for SparkReceiverIO.
    
    * Set export javadoc true for cdap and sparkreceiver
    
    * Updates to CdapIO and SparkReceiverIO readmes
    
    * Add manual how to add support for Batch and Streaming Cdap plugins.
    
    * Add manual how to add support for Spark Receiver.
    
    * Fix whitespace
    
    * updates to CDAP and SparkReceiver readme files
    
    * Updated CDAP readme
    
    * Fix links in readme
    
    Co-authored-by: Alex Kosolapov <alex.kosola...@gmail.com>
---
 sdks/java/io/cdap/README.md               | 145 ++++++++++++++++++++++++++++++
 sdks/java/io/cdap/build.gradle            |   1 -
 sdks/java/io/sparkreceiver/2/README.md    |  40 +++++++--
 sdks/java/io/sparkreceiver/2/build.gradle |   1 -
 4 files changed, 180 insertions(+), 7 deletions(-)

diff --git a/sdks/java/io/cdap/README.md b/sdks/java/io/cdap/README.md
new file mode 100644
index 00000000000..5269204c1d1
--- /dev/null
+++ b/sdks/java/io/cdap/README.md
@@ -0,0 +1,145 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+-->
+
+# CdapIO
+CdapIO provides I/O transforms for [CDAP](https://cdap.io/) plugins.
+
+## What is CDAP?
+
+[CDAP](https://cdap.io/) is an application platform for building and managing 
data applications in hybrid and multi-cloud environments.
+It enables developers, business analysts, and data scientists to use a visual 
rapid development environment and utilize common patterns,
+data, and application abstractions to accelerate the development of data 
applications, addressing a broader range of real-time and batch use cases.
+
+[CDAP plugins](https://github.com/data-integrations) types:
+- Batch source
+- Batch sink
+- Streaming source
+
+To learn more about CDAP plugins please see 
[io.cdap.cdap.api.annotation.Plugin](https://javadoc.io/static/io.cdap.cdap/cdap-api/6.7.2/io/cdap/cdap/api/annotation/Plugin.html)
 and [Data Integrations](https://github.com/data-integrations) plugins 
repository.
+
+## CDAP Batch plugins support in CDAP IO
+
+CdapIO supports CDAP Batch plugins based on Hadoop 
[InputFormat](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/InputFormat.html)
 and 
[OutputFormat](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/OutputFormat.html).
+CDAP batch plugins support is implemented using 
[HadoopFormatIO](https://beam.apache.org/documentation/io/built-in/hadoop/).
+
+CdapIO currently supports the following CDAP Batch plugins by referencing 
`CDAP plugin` class:
+* [Hubspot Batch 
Source](https://github.com/data-integrations/hubspot/blob/develop/src/main/java/io/cdap/plugin/hubspot/source/batch/HubspotBatchSource.java)
+* [Hubspot Batch 
Sink](https://github.com/data-integrations/hubspot/blob/develop/src/main/java/io/cdap/plugin/hubspot/sink/batch/HubspotBatchSink.java)
+* [Salesforce Batch 
Source](https://github.com/data-integrations/salesforce/blob/develop/src/main/java/io/cdap/plugin/salesforce/plugin/source/batch/SalesforceBatchSource.java)
+* [Salesforce Batch 
Sink](https://github.com/data-integrations/salesforce/blob/develop/src/main/java/io/cdap/plugin/salesforce/plugin/sink/batch/SalesforceBatchSink.java)
+* [ServiceNow Batch 
Source](https://github.com/data-integrations/servicenow-plugins/blob/develop/src/main/java/io/cdap/plugin/servicenow/source/ServiceNowSource.java)
+* [Zendesk Batch 
Source](https://github.com/data-integrations/zendesk/blob/develop/src/main/java/io/cdap/plugin/zendesk/source/batch/ZendeskBatchSource.java)
+
+It means that all these plugins can be used like this:
+``CdapIO.withCdapPluginClass(HubspotBatchSource.class)``
+
+### Requirements for Cdap Batch plugins
+
+CDAP Batch plugin should be based on `HadoopFormat` implementation.
+
+### How to add support for a new CDAP Batch plugin
+
+To add CdapIO support for a new CDAP Batch 
[Plugin](src/main/java/org/apache/beam/sdk/io/cdap/Plugin.java) perform the 
following steps:
+1. Find CDAP plugin artifacts in the Maven Central repository. *Example:* 
[Hubspot plugin Maven 
repository](https://mvnrepository.com/artifact/io.cdap/hubspot-plugins/1.0.0). 
*Note:* To add a custom CDAP plugin, please follow [Sonatype publishing 
guidelines](https://central.sonatype.org/publish/).
+2. Add the CDAP plugin Maven dependency to the `build.gradle` file. *Example:* 
``implementation "io.cdap:hubspot-plugins:1.0.0"``.
+3. Here are two ways of using CDAP batch plugin with CdapIO:
+   1. Using `Plugin.createBatch()` method. Pass Cdap Plugin class and correct 
`InputFormat` (or `OutputFormat`) and `InputFormatProvider` (or 
`OutputFormatProvider`) classes to CdapIO. *Example:*
+   ```
+   CdapIO.withCdapPlugin(
+      Plugin.createBatch(
+      EmployeeBatchSource.class,
+      EmployeeInputFormat.class,
+      EmployeeInputFormatProvider.class));
+   ```
+   2. Using `MappingUtils`.
+      1. Navigate to 
[MappingUtils](src/main/java/org/apache/beam/sdk/io/cdap/MappingUtils.java) 
class.
+      2. Modify `getPluginClassByName()` method:
+      3. Add the code for mapping Cdap Plugin class name and `Input/Output 
Format` and `FormatProvider` classes.
+      *Example:*
+      ```
+      if (pluginClass.equals(EmployeeBatchSource.class)){
+         return Plugin.createBatch(pluginClass,
+                       EmployeeInputFormat.class,
+                       EmployeeInputFormatProvider.class);
+      }
+      ```
+      4. After these steps you will be able to use Cdap Plugin by class name 
like this: ``CdapIO.withCdapPluginClass(EmployeeBatchSource.class)``
+
+To learn more, please check out [complete 
examples](https://github.com/apache/beam/tree/master/examples/java/cdap/src/main/java/org/apache/beam/examples/complete/cdap).
+
+## CDAP Streaming plugins support in CDAP IO
+
+CdapIO supports CDAP Streaming plugins based on [Apache Spark 
Receiver](https://spark.apache.org/docs/2.4.0/streaming-custom-receivers.html).
+CDAP streaming plugins support is implemented using 
[SparkReceiverIO](https://github.com/apache/beam/tree/master/sdks/java/io/sparkreceiver).
+
+### Requirements for Cdap Streaming plugins
+
+1. CDAP Streaming plugin should be based on `Spark Receiver`.
+2. CDAP Streaming plugin should support work with offsets.
+   1. Corresponding Spark Receiver should implement 
[HasOffset](https://github.com/apache/beam/blob/master/sdks/java/io/sparkreceiver/src/main/java/org/apache/beam/sdk/io/sparkreceiver/HasOffset.java)
 interface.
+   2. Records should have the numeric field that represents record offset. 
*Example:* `RecordId` field for Salesforce and `vid` field for Hubspot plugins.
+   For more details please see 
[GetOffsetUtils](https://github.com/apache/beam/tree/master/examples/java/cdap/src/main/java/org/apache/beam/examples/complete/cdap/utils/GetOffsetUtils.java)
 class from examples.
+
+### How to add support for a new CDAP Streaming plugin
+
+To add CdapIO support for a new CDAP Streaming SparkReceiver 
[Plugin](src/main/java/org/apache/beam/sdk/io/cdap/Plugin.java), perform the 
following steps:
+1. Find CDAP plugin artifacts in the Maven Central repository. *Example:* 
[Hubspot plugin Maven 
repository](https://mvnrepository.com/artifact/io.cdap/hubspot-plugins/1.0.0). 
*Note:* To add a custom CDAP plugin, please follow [Sonatype publishing 
guidelines](https://central.sonatype.org/publish/).
+2. Add CDAP plugin Maven dependency to the `build.gradle` file. *Example:* 
``implementation "io.cdap:hubspot-plugins:1.0.0"``.
+3. Implement function that will define how to get `Long offset` from the 
record of the Cdap Plugin.
+*Example:* see 
[GetOffsetUtils](https://github.com/apache/beam/tree/master/examples/java/cdap/src/main/java/org/apache/beam/examples/complete/cdap/utils/GetOffsetUtils.java)
 class from examples.
+4. Here are two ways of using Cdap streaming Plugin with CdapIO:
+    1. Using `Plugin.createStreaming()` method. Pass Cdap Plugin class, 
correct `getOffsetFn` (from step 3) and Spark `Receiver` class to CdapIO. 
*Example:*
+   ```
+   CdapIO.withCdapPlugin(
+      Plugin.createStreaming(
+      HubspotStreamingSource.class,
+      offsetFnForHubspot,
+      HubspotReceiver.class)));
+   ```
+    2. Using `MappingUtils`.
+        1. Navigate to 
[MappingUtils](src/main/java/org/apache/beam/sdk/io/cdap/MappingUtils.java) 
class.
+        2. Modify `getPluginClassByName()` method:
+        3. Add the code for mapping Cdap Plugin class name, `getOffsetFn` 
function and Spark `Receiver` class.
+           *Example:*
+       ```
+       if (pluginClass.equals(HubspotStreamingSource.class)){
+          return Plugin.createStreaming(pluginClass,
+                        getOffsetFnForHubpot(),
+                        HubspotReceiverClass.class);
+       }
+       ```
+        4. After these steps you will be able to use Cdap Plugin by class name 
like this: ``CdapIO.withCdapPluginClass(HubspotStreamingSource.class)``
+
+To learn more, please check out [complete 
examples](https://github.com/apache/beam/tree/master/examples/java/cdap/src/main/java/org/apache/beam/examples/complete/cdap).
+
+## Dependencies
+
+To use CdapIO please add a dependency on `beam-sdks-java-io-cdap`.
+
+```maven
+<dependency>
+    <groupId>org.apache.beam</groupId>
+    <artifactId>beam-sdks-java-io-cdap</artifactId>
+    <version>...</version>
+</dependency>
+```
+
+## Documentation
+
+The documentation and usage examples are maintained in JavaDoc for 
[CdapIO.java](src/main/java/org/apache/beam/sdk/io/cdap/CdapIO.java).
diff --git a/sdks/java/io/cdap/build.gradle b/sdks/java/io/cdap/build.gradle
index 52edc91078a..2274325ceef 100644
--- a/sdks/java/io/cdap/build.gradle
+++ b/sdks/java/io/cdap/build.gradle
@@ -22,7 +22,6 @@ plugins {
 }
 
 applyJavaNature(
-        exportJavadoc: false,
         automaticModuleName: 'org.apache.beam.sdk.io.cdap',
 )
 provideIntegrationTestingDependencies()
diff --git a/sdks/java/io/sparkreceiver/2/README.md 
b/sdks/java/io/sparkreceiver/2/README.md
index 6ce48efd58f..035456f30fd 100644
--- a/sdks/java/io/sparkreceiver/2/README.md
+++ b/sdks/java/io/sparkreceiver/2/README.md
@@ -16,12 +16,44 @@
     specific language governing permissions and limitations
     under the License.
 -->
+# SparkReceiverIO
 
-SparkReceiverIO contains I/O transforms which allow you to read messages from 
Spark Receiver (org.apache.spark.streaming.receiver.Receiver).
+SparkReceiverIO provides I/O transforms to read messages from an [Apache Spark 
Receiver](https://spark.apache.org/docs/2.4.0/streaming-custom-receivers.html) 
`org.apache.spark.streaming.receiver.Receiver` as an unbounded source.
+
+## Prerequistes
+
+SparkReceiverIO supports [Spark 
Receivers](https://spark.apache.org/docs/2.4.0/streaming-custom-receivers.html) 
(Spark version 2.4).
+1. Corresponding Spark Receiver should implement 
[HasOffset](https://github.com/apache/beam/blob/master/sdks/java/io/sparkreceiver/src/main/java/org/apache/beam/sdk/io/sparkreceiver/HasOffset.java)
 interface.
+2. Records should have the numeric field that represents record offset. 
*Example:* `RecordId` field for Salesforce and `vid` field for Hubspot 
Receivers.
+   For more details please see 
[GetOffsetUtils](https://github.com/apache/beam/tree/master/examples/java/cdap/src/main/java/org/apache/beam/examples/complete/cdap/utils/GetOffsetUtils.java)
 class from CDAP plugins examples.
+
+## Adding support for a new Spark Receiver
+
+To add SparkReceiverIO support for a new Spark `Receiver`, perform the 
following steps:
+1. Add Spark Receiver to the Maven Central repository (see [Sonatype 
publishing guidelines](https://central.sonatype.org/publish/)). *Example:* 
[Hubspot CDAP plugin Maven 
repository](https://mvnrepository.com/artifact/io.cdap/hubspot-plugins/1.0.0).
+2. Add Spark Receiver Maven dependency to the `build.gradle` file. *Example:* 
``implementation "io.cdap:hubspot-plugins:1.0.0"``.
+3. Implement function that will define how to get `Long offset` from the 
record of the Spark Receiver.
+   *Example:* see 
[GetOffsetUtils](https://github.com/apache/beam/tree/master/examples/java/cdap/src/main/java/org/apache/beam/examples/complete/cdap/utils/GetOffsetUtils.java)
 class from CDAP plugins examples.
+4. Construct `ReceiverBuilder` object by passing class of record that you want 
to read (e.g. String) and your Spark `Receiver` class name (dependency from 
step 2). *Example:*
+   ```
+      ReceiverBuilder<String, HubspotReceiver> receiverBuilder =
+      new ReceiverBuilder<>(HubspotReceiver.class).withConstructorArgs();
+   ```
+5. Use your Spark `Receiver` with SparkReceiverIO:
+    1. Pass correct `getOffsetFn` (from step 3) and correct `ReceiverBuilder` 
(from step 4). *Example:*
+   ```
+   SparkReceiverIO.Read<V> reader =
+            SparkReceiverIO.<V>read()
+                .withGetOffsetFn(getOffsetFn)
+                .withSparkReceiverBuilder(receiverBuilder);
+   ```
+
+
+To learn more, please check out CDAP Streaming plugins [complete 
examples](https://github.com/apache/beam/tree/master/examples/java/cdap/src/main/java/org/apache/beam/examples/complete/cdap)
 where Spark Receivers are used.
 
 ## Dependencies
 
-To use SparkReceiverIO you must first add a dependency on 
`beam-sdks-java-io-sparkreceiver`.
+To use SparkReceiverIO, add a dependency on `beam-sdks-java-io-sparkreceiver`.
 
 ```maven
 <dependency>
@@ -33,6 +65,4 @@ To use SparkReceiverIO you must first add a dependency on 
`beam-sdks-java-io-spa
 
 ## Documentation
 
-The documentation is maintained in JavaDoc for SparkReceiverIO class. It 
includes
-usage examples and primary concepts.
-- 
[SparkReceiverIO.java](src/main/java/org/apache/beam/sdk/io/sparkreceiver/SparkReceiverIO.java)
+The documentation and usage examples are maintained in JavaDoc for 
[SparkReceiverIO 
class](src/main/java/org/apache/beam/sdk/io/sparkreceiver/SparkReceiverIO.java).
diff --git a/sdks/java/io/sparkreceiver/2/build.gradle 
b/sdks/java/io/sparkreceiver/2/build.gradle
index 7607127adca..b039b9ce204 100644
--- a/sdks/java/io/sparkreceiver/2/build.gradle
+++ b/sdks/java/io/sparkreceiver/2/build.gradle
@@ -22,7 +22,6 @@ plugins {
 }
 
 applyJavaNature(
-        exportJavadoc: false,
         automaticModuleName: 'org.apache.beam.sdk.io.sparkreceiver',
 )
 provideIntegrationTestingDependencies()

Reply via email to